How Far Are We From (Semi-)Automatic Annotation Of Anaphoric Links In Corpora?
Abstract
The paper puts forward for discussion a proposal for the semi-automatic annotation of pronoun-antecedent pairs in corpora. The proposal is based on robust, knowledge-poor pronoun resolution followed by post-editing. The paper is structured as follows. The introduction observes that the automatic identification of referential links in corpora has lagged behind comparable lexical, syntactic and even semantic tasks. Section 2 outlines the author's practical and robust knowledge-poor approach to pronoun resolution, which is subsequently put forward as the core of a larger architecture for the automatic tagging of referential links. Section 3 briefly presents other related knowledge-poor approaches, while section 4 discusses the limitations and advantages of the practical approach. The main argument of the paper is in section 5, which presents the idea of developing a semi-automatic environment for annotating anaphoric links and outlines the components of such a program. Finally, the conclusion considers the anticipated success rate of the approach.
Similar resources
Arabic anaphora resolution: corpora annotation with coreferential links
Annotated resources are much needed for the evaluation and training of anaphora resolution systems. Coreferential chain annotation is a difficult task which cannot be carried out without an appropriate tool. In this paper, we present our work on Arabic corpora annotation with anaphoric links (i.e., the annotation of the identity relation between the anaphors and their antecedents). In particular,...
Building annotated resources for automatic text summarisation
Annotated corpora are necessary for automatic summarisation, but given how difficult it is to produce them, only a few are available. This paper presents an annotation tool which helps the human annotator to select the important units from a text. In addition to the tool, a new annotation scheme is proposed so that phenomena such as the presence of anaphoric expressions and redundancy can be ...
Comparison of Annotating Methods for Named Entity Corpora
We compared two methods of annotating a corpus with non-expert annotators for a named entity (NE) recognition task: (1) revising the results of an existing NE recognizer and (2) annotating NEs entirely by hand. We investigated the annotation time, the degree of agreement, and the performance measured against the gold standard. As we have two annotators for one file of each method, we evaluated the ...
Automatic measurement of instantaneous changes in the walls of carotid artery with sequential ultrasound images
Introduction: This study presents a computerized analyzing method for detection of instantaneous changes of far and near walls of the common carotid artery in sequential ultrasound images by applying the maximum gradient algorithm. Maximum gradient was modified and some characteristics were added from the dynamic programming algorithm for our applications. Methods: The algorithm was evaluat...
How Far Behind Are the South Asian Countries in Relation to East Asian
We define as South Asian countries those countries in Asia that start with Iran and end with Bangladesh. We then use export statistics in terms of revealed comparative advantage (RCA) for 14 industrial sectors to measure distances of export capabilities for these countries in relation to the "Western" developed and East Asian countries. Statistical methods such as multidimensional scaling and ...